{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "images.ipynb",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.4"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "mt9dL5dIir8X"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "colab_type": "code",
        "id": "ufPx7EiCiqgR",
        "colab": {}
      },
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License.\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ucMoYase6URl"
      },
      "source": [
        "# Load images"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "_Wwu5SXZmEkB"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/load_data/images\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/load_data/images.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/load_data/images.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/load_data/images.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Oxw4WahM7DU9"
      },
      "source": [
        "This tutorial provides a simple example of how to load an image dataset using `tf.data`.\n",
        "\n",
        "The dataset used in this example is distributed as directories of images, with one class of image per directory."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hoQQiZDB6URn"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "3vhAMaIOBIee",
        "colab": {}
      },
      "source": [
        "from __future__ import absolute_import, division, print_function, unicode_literals"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "QGXxBuPyKJw1",
        "colab": {}
      },
      "source": [
        "try:\n",
        "  # %tensorflow_version only exists in Colab.\n",
        "  !pip install tf-nightly\n",
        "except Exception:\n",
        "  pass\n",
        "import tensorflow as tf"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "KT6CcaqgQewg",
        "colab": {}
      },
      "source": [
        "AUTOTUNE = tf.data.experimental.AUTOTUNE"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "gIksPgtT8B6B",
        "colab": {}
      },
      "source": [
        "import IPython.display as display\n",
        "from PIL import Image\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "import os"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "ZJ20R66fzktl",
        "colab": {}
      },
      "source": [
        "tf.__version__"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "wO0InzL66URu"
      },
      "source": [
        "### Retrieve the images\n",
        "\n",
        "Before you start any training, you will need a set of images to teach the network about the new classes you want to recognize. You can use an archive of creative-commons licensed flower photos from Google.\n",
        "\n",
        "Note: all images are licensed CC-BY, creators are listed in the `LICENSE.txt` file."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "rN-Pc6Zd6awg",
        "colab": {}
      },
      "source": [
        "import pathlib\n",
        "data_dir = tf.keras.utils.get_file(origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',\n",
        "                                         fname='flower_photos', untar=True)\n",
        "data_dir = pathlib.Path(data_dir)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "rFkFK74oO--g"
      },
      "source": [
        "After downloading (218MB), you should now have a copy of the flower photos available.\n",
        "\n",
        "The directory contains 5 sub-directories, one per class:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "QhewYCxhXQBX",
        "colab": {}
      },
      "source": [
        "image_count = len(list(data_dir.glob('*/*.jpg')))\n",
        "image_count"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "sJ1HKKdR4A7c",
        "colab": {}
      },
      "source": [
        "CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != \"LICENSE.txt\"])\n",
        "CLASS_NAMES"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "IVxsk4OW61TY"
      },
      "source": [
        "Each directory contains images of that type of flower. Here are some roses:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "crs7ZjEp60Ot",
        "colab": {}
      },
      "source": [
        "roses = list(data_dir.glob('roses/*'))\n",
        "\n",
        "for image_path in roses[:3]:\n",
        "    display.display(Image.open(str(image_path)))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "6jobDTUs8Wxu"
      },
      "source": [
        "## Load using `keras.preprocessing`"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ehhW308g8soJ"
      },
      "source": [
        "A simple way to load images is to use `tf.keras.preprocessing`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "syDdF_LWVrWE",
        "colab": {}
      },
      "source": [
        "# The 1./255 is to convert from uint8 to float32 in range [0,1].\n",
        "image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "lAmtzsnjDNhB"
      },
      "source": [
        "Define some parameters for the loader:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "1zf695or-Flq",
        "colab": {}
      },
      "source": [
        "BATCH_SIZE = 32\n",
        "IMG_HEIGHT = 224\n",
        "IMG_WIDTH = 224\n",
        "STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Pw94ajOOVrWI",
        "colab": {}
      },
      "source": [
        "train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),\n",
        "                                                     batch_size=BATCH_SIZE,\n",
        "                                                     shuffle=True,\n",
        "                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),\n",
        "                                                     classes = list(CLASS_NAMES))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "2ZgIZeXaDUsF"
      },
      "source": [
        "Inspect a batch:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "nLp0XVG_Vgi2",
        "colab": {}
      },
      "source": [
        "def show_batch(image_batch, label_batch):\n",
        "  plt.figure(figsize=(10,10))\n",
        "  for n in range(25):\n",
        "      ax = plt.subplot(5,5,n+1)\n",
        "      plt.imshow(image_batch[n])\n",
        "      plt.title(CLASS_NAMES[label_batch[n]==1][0].title())\n",
        "      plt.axis('off')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "suh6Sjv68rY3",
        "colab": {}
      },
      "source": [
        "image_batch, label_batch = next(train_data_gen)\n",
        "show_batch(image_batch, label_batch)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "AxS1cLzM8mEp"
      },
      "source": [
        "## Load using `tf.data`"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Ylj9fgkamgWZ"
      },
      "source": [
        "The above `keras.preprocessing` method is convienient, but has two downsides: \n",
        "\n",
        "1. It's slow. See the performance section below.\n",
        "1. It lacks fine-grained control.\n",
        "1. It is not well integrated with the rest of TensorFlow."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "IIG5CPaULegg"
      },
      "source": [
        "To load the files as a `tf.data.Dataset` first create a dataset of the file paths:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "lAkQp5uxoINu",
        "colab": {}
      },
      "source": [
        "list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "coORvEH-NGwc",
        "colab": {}
      },
      "source": [
        "for f in list_ds.take(5):\n",
        "  print(f.numpy())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "91CPfUUJ_8SZ"
      },
      "source": [
        "Write a short pure-tensorflow function that converts a file paths to an (image_data, label) pair:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "arSQzIey-4D4",
        "colab": {}
      },
      "source": [
        "def get_label(file_path):\n",
        "  # convert the path to a list of path components\n",
        "  parts = tf.strings.split(file_path, os.path.sep)\n",
        "  # The second to last is the class-directory\n",
        "  return parts[-2] == CLASS_NAMES"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "MGlq4IP4Aktb",
        "colab": {}
      },
      "source": [
        "def decode_img(img):\n",
        "  # convert the compressed string to a 3D uint8 tensor\n",
        "  img = tf.image.decode_jpeg(img, channels=3)\n",
        "  # Use `convert_image_dtype` to convert to floats in the [0,1] range.\n",
        "  img = tf.image.convert_image_dtype(img, tf.float32)\n",
        "  # resize the image to the desired size.\n",
        "  return tf.image.resize(img, [IMG_WIDTH, IMG_HEIGHT])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "-xhBRgvNqRRe",
        "colab": {}
      },
      "source": [
        "def process_path(file_path):\n",
        "  label = get_label(file_path)\n",
        "  # load the raw data from the file as a string\n",
        "  img = tf.io.read_file(file_path)\n",
        "  img = decode_img(img)\n",
        "  return img, label"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "S9a5GpsUOBx8"
      },
      "source": [
        "Use `Dataset.map` to create a dataset of `image, label` pairs:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "3SDhbo8lOBQv",
        "colab": {}
      },
      "source": [
        "# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.\n",
        "labeled_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "kxrl0lGdnpRz",
        "colab": {}
      },
      "source": [
        "for image, label in labeled_ds.take(1):\n",
        "  print(\"Image shape: \", image.numpy().shape)\n",
        "  print(\"Label: \", label.numpy())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "vYGCgJuR_9Qp"
      },
      "source": [
        "### Basic methods for training"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "wwZavzgsIytz"
      },
      "source": [
        "To train a model with this dataset you will want the data:\n",
        "\n",
        "* To be well shuffled.\n",
        "* To be batched.\n",
        "* Batches to be available as soon as possible.\n",
        "\n",
        "These features can be easily added using the `tf.data` api."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "uZmZJx8ePw_5",
        "colab": {}
      },
      "source": [
        "\n",
        "\n",
        "def prepare_for_training(ds, cache=True, shuffle_buffer_size=1000):\n",
        "  # This is a small dataset, only load it once, and keep it in memory.\n",
        "  # use `.cache(filename)` to cache preprocessing work for datasets that don't\n",
        "  # fit in memory.\n",
        "  if cache:\n",
        "    if isinstance(cache, str):\n",
        "      ds = ds.cache(cache)\n",
        "    else:\n",
        "      ds = ds.cache()\n",
        "\n",
        "  ds = ds.shuffle(buffer_size=shuffle_buffer_size)\n",
        "\n",
        "  # Repeat forever\n",
        "  ds = ds.repeat()\n",
        "\n",
        "  ds = ds.batch(BATCH_SIZE)\n",
        "\n",
        "  # `prefetch` lets the dataset fetch batches in the background while the model\n",
        "  # is training.\n",
        "  ds = ds.prefetch(buffer_size=AUTOTUNE)\n",
        "\n",
        "  return ds"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "-YKnrfAeZV10",
        "colab": {}
      },
      "source": [
        "train_ds = prepare_for_training(labeled_ds)\n",
        "\n",
        "image_batch, label_batch = next(iter(train_ds))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "UN_Dnl72YNIj",
        "colab": {}
      },
      "source": [
        "show_batch(image_batch.numpy(), label_batch.numpy())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "UMVnoBcG_NlQ"
      },
      "source": [
        "## Performance\n",
        "\n",
        "Note: This section just shows a couple of easy tricks that may help performance. For an in depth guide see [Input Pipeline Performance](../../guide/performance/datasets)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "oNmQqgGhLWie"
      },
      "source": [
        "To investigate, first here's a function to check the performance of our datasets:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "_gFVe1rp_MYr",
        "colab": {}
      },
      "source": [
        "import time\n",
        "default_timeit_steps = 1000\n",
        "\n",
        "def timeit(ds, steps=default_timeit_steps):\n",
        "  start = time.time()\n",
        "  it = iter(ds)\n",
        "  for i in range(steps):\n",
        "    batch = next(it)\n",
        "    if i%10 == 0:\n",
        "      print('.',end='')\n",
        "  print()\n",
        "  end = time.time()\n",
        "\n",
        "  duration = end-start\n",
        "  print(\"{} batches: {} s\".format(steps, duration))\n",
        "  print(\"{:0.5f} Images/s\".format(BATCH_SIZE*steps/duration))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "TYiOr4vdLcNX"
      },
      "source": [
        "Let's compare the speed of the two data generators:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "85Yc-jZnVjvm",
        "colab": {}
      },
      "source": [
        "# `keras.preprocessing`\n",
        "timeit(train_data_gen)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "IjouTJadRxyp",
        "colab": {}
      },
      "source": [
        "# `tf.data`\n",
        "timeit(train_ds)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ZB2TjJR62BJ3"
      },
      "source": [
        "A large part of the performance gain comes from the use of `.cache`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Oq1V854E2Nh4",
        "colab": {}
      },
      "source": [
        "uncached_ds = prepare_for_training(labeled_ds, cache=False)\n",
        "timeit(uncached_ds)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "-JCHymejWSPZ"
      },
      "source": [
        "If the dataset doesn't fit in memory use a cache file to maintain some of the advantages:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "RqHFQFwxWNbu",
        "colab": {}
      },
      "source": [
        "filecache_ds = prepare_for_training(labeled_ds, cache=\"./flowers.tfcache\")\n",
        "timeit(filecache_ds)"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}